# Chat LLM Proxy API

## Overview

The Chat LLM Proxy project provides a set of APIs for interacting with various language models. This documentation outlines the available endpoints, request and response formats, and example usage to help developers integrate with the API effectively.

## Table of Contents

- [API Endpoints](#api-endpoints)
  - [Generate Answer](#generate-answer)
  - [Get Sources](#get-sources)
  - [Handle Continuation Request](#handle-continuation-request)
  - [Run Concurrently](#run-concurrently)
  - [Stream Generate Answer](#stream-generate-answer)
- [Request Models Documentation](#request-models-documentation)

## API Endpoints
### Generate Answer

`POST /api/generate_answer`

#### Overview

The `generate_answer` API endpoint is responsible for generating a response based on the provided input parameters, utilizing various language models. It processes the request and returns a structured response containing the generated answer along with relevant metadata.
Request Parameters
Parameter | Type | Description |
---|---|---|
request | GenerateAnswerRequest | The request object containing user input and configuration for generating the answer. |
is_session | bool | Indicates whether the request is part of an ongoing session. |
model_info | dict | A dictionary containing information about the model to be used for generating the answer. |
prompt | str | The prompt or question for which the answer is to be generated. |
tool_defns | list | A list of tool definitions that may be used in the answer generation process. |
all_tools | dict | A dictionary containing all available tools for the request. |
history_prompt | list | A list of previous prompts or messages in the conversation history. |
query | str | The query string that may influence the answer generation. |
gen_search_text | str | The generated search text that may be included in the response. |
generated_chat_id | int | A unique identifier for the generated chat session. |
api_key | str | The API key for authentication and authorization purposes. |
#### Response Format

The response from the `generate_answer` API is a JSON object containing the following fields:

| Field | Type | Description |
|---|---|---|
| `response` | `str` | The generated answer based on the input prompt. |
| `generated_search_text` | `str` | The search text generated during the processing of the request. |
| `finish_reason` | `str` | Indicates the reason for the completion of the response generation (e.g., "stop", "length"). |
#### Example Usage

##### Request

```json
{
  "request": {
    "user_cred": {
      "token": "user_token",
      "client": {
        "tenant_id": "tenant_id"
      }
    },
    "task_process": {
      "service": "service_name"
    },
    "user_chat": {
      "query": "What is the capital of France?",
      "kvp": {}
    }
  },
  "is_session": true,
  "model_info": {
    "modelId": "azure",
    "modelVersion": "v1",
    "temperature": 0.7,
    "max_tokens": 150
  },
  "prompt": "What is the capital of France?",
  "tool_defns": [],
  "all_tools": {},
  "history_prompt": [],
  "query": "What is the capital of France?",
  "gen_search_text": "",
  "generated_chat_id": 12345,
  "api_key": "your_api_key"
}
```
##### Response

```json
{
  "response": "The capital of France is Paris.",
  "generated_search_text": "",
  "finish_reason": "stop"
}
```
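For reference, a minimal Python client call is sketched below. The base URL `http://localhost:8000` is a placeholder, and the payload is abbreviated; in practice, send the full request body shown above.

```python
import requests

BASE_URL = "http://localhost:8000"  # placeholder; use your deployment's host

# Abbreviated payload for illustration; see the full request body above.
payload = {
    "prompt": "What is the capital of France?",
    "query": "What is the capital of France?",
    "is_session": True,
    "model_info": {"modelId": "azure", "modelVersion": "v1"},
    "api_key": "your_api_key",
}

resp = requests.post(f"{BASE_URL}/api/generate_answer", json=payload, timeout=60)
resp.raise_for_status()
body = resp.json()
print(body["response"], "| finish_reason:", body["finish_reason"])
```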
### Get Sources

`POST /api/get_sources`

#### Overview

The `get_sources` function retrieves various sources of information based on the user's request and bot details. It processes the input request and returns a structured dictionary containing relevant data.
#### Request Parameters

The function accepts the following parameters:

- `request` (`GenerateAnswerRequest`): An object containing user request details, including user credentials and task process information.
- `bot_details` (`dict`): A dictionary containing details about the bot, including `llm_data`, `system_instruction_cache_key`, and `num_new_uploads`.
- `chat_hist` (`list`): A list of previous chat messages that may influence the current request.
#### Response Format

The function returns a tuple containing:

- `sources` (`dict`): A dictionary with the following keys:
  - `gen_search_text`: The generated search text based on the context.
  - `model_info`: Information about the model being used.
  - `examples`: Example responses from the bot.
  - `query`: The processed query string.
  - `all_input_texts`: A dictionary containing filtered input texts.
  - `llm_data`: The data related to the language model.
  - `history_prompt`: The prompt history for the conversation.
  - `num_new_uploads`: The number of new uploads associated with the request.
- `bool`: A boolean indicating the success or failure of the operation.
#### Example Usage

```python
from app.controller.chat_classes import GenerateAnswerRequest

# Create a request object (user_credentials, task_process_info, bot_configuration,
# user_chat_info, and previous_context are placeholders for your own values)
request = GenerateAnswerRequest(
    user_cred=user_credentials,
    task_process=task_process_info,
    bot_config=bot_configuration,
    user_chat=user_chat_info,
    prev_context=previous_context,
)

# Bot details
bot_details = {
    "llm_data": llm_data,
    "system_instruction_cache_key": "some_cache_key",
    "num_new_uploads": 2,
}

# Chat history
chat_hist = [
    {"user_query": "What is the weather today?"},
    {"user_query": "Tell me about the news."},
]

# Call the get_sources function
sources, success = get_sources(request, bot_details, chat_hist)

# Output the sources
print(sources)
```
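Since the second element of the returned tuple signals success or failure, callers will typically want to guard on it before reading the sources. A minimal sketch:

```python
if not success:
    raise RuntimeError("get_sources failed for this request")

# The documented keys are safe to read once success is confirmed.
print(sources["query"])
print(sources["model_info"])
```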
### Handle Continuation Request

#### Description

The `handle_continuation_request` function processes incoming queries to determine if a continuation request is being made. Specifically, it checks if the query matches a predefined constant that indicates a request to continue from the last answer, as shown in the sketch below.

#### Parameters

- `query` (`str`): The incoming message payload to check. This is the user input that may indicate a continuation request.

#### Returns

- `str`: The processed query. If the input query matches the constant indicating continuation, it returns the string `"continue"`. Otherwise, it returns the original query.
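The check itself is a simple string comparison. A minimal sketch follows; the constant name `CONTINUE_ANSWER` is hypothetical, and the actual constant is defined elsewhere in the application.

```python
# Hypothetical constant; the real name and value live in the app's constants module.
CONTINUE_ANSWER = "$continue_answer"

def handle_continuation_request(query: str) -> str:
    """Return "continue" when the query is the continuation sentinel."""
    if query == CONTINUE_ANSWER:
        return "continue"
    return query
```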
#### Example Usage

```python
# Example of handling a continuation request
user_query = "$continue_answer"
processed_query = handle_continuation_request(user_query)
print(processed_query)  # Output: "continue"

# Example with a regular query
user_query = "What is the weather today?"
processed_query = handle_continuation_request(user_query)
print(processed_query)  # Output: "What is the weather today?"
```
### Run Concurrently

#### Purpose

The `run_concurrently` function executes two tasks concurrently using a thread pool, allowing work that can be performed simultaneously to proceed in parallel and improving overall application performance. A sketch of this pattern appears after the return values below.

#### Request Parameters

The function accepts the following parameters:

- `llm_data` (`dict`): A dictionary containing data related to the language model, including any necessary configurations and inputs.
- `token` (`str`): The authentication token used to access secured resources.
- `query` (`str`): The query string that will be processed by the language model.
- `service` (`str`): The service identifier that specifies which language model service to use.

#### Return Values

The function returns a tuple containing:

- `search_text_result` (`str`): The result of the semantic search operation.
- `extracted_text_result` (`str`): The result of the text extraction operation.
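A minimal sketch of the concurrency pattern, using `concurrent.futures.ThreadPoolExecutor`; the worker functions `run_semantic_search` and `run_text_extraction` are hypothetical stand-ins for the module's actual callables.

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(llm_data: dict, token: str, query: str, service: str):
    # run_semantic_search and run_text_extraction are hypothetical worker callables.
    # Submitting both lets them execute in parallel on the pool's threads.
    with ThreadPoolExecutor(max_workers=2) as pool:
        search_future = pool.submit(run_semantic_search, llm_data, token, query, service)
        extract_future = pool.submit(run_text_extraction, llm_data, token)
        # result() blocks until each task completes, so the function returns
        # only once both results are available.
        return search_future.result(), extract_future.result()
```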
#### Example Usage

```python
llm_data = {
    "context": "Sample context for processing.",
    "upload_files": [],
    # Additional necessary data...
}
token = "your_auth_token"
query = "What is the capital of France?"
service = "example_service"

search_text, extracted_text = run_concurrently(llm_data, token, query, service)
print("Search Text:", search_text)
print("Extracted Text:", extracted_text)
```
### Stream Generate Answer

`POST /api/stream_generate_answer`

#### Overview

The `stream_generate_answer` API endpoint is designed to generate answers in a streaming manner based on the provided request parameters. This allows for real-time interaction and response generation, making it suitable for applications that require immediate feedback.
#### Request Parameters

| Parameter | Type | Description |
|---|---|---|
| `request` | `GenerateAnswerRequest` | The request object containing user credentials, query, and other necessary information. |
| `is_session` | `bool` | Indicates whether the request is part of an ongoing session. |
| `model_info` | `dict` | A dictionary containing information about the model being used, such as model ID and version. |
| `prompt` | `str` | The prompt to be used for generating the answer. |
| `tool_defns` | `list` | A list of tool definitions that may be used in the answer generation process. |
| `all_tools` | `dict` | A dictionary containing all available tools for the request. |
| `history_prompt` | `list` | A list of previous prompts to provide context for the current request. |
| `query` | `str` | The query string that the model will respond to. |
| `gen_search_text` | `str` | The generated search text that may be used in the response. |
| `generated_chat_id` | `int` | A unique identifier for the generated chat session. |
| `api_key` | `str` | The API key for authentication purposes. |
#### Response Format

The response from the `stream_generate_answer` API is a stream of chunks, each containing the following structure:

| Field | Type | Description |
|---|---|---|
| `generated_search_text` | `str` | The search text generated during the answer generation process. |
| `response` | `str` | The generated answer from the model. |
| `finish_reason` | `str` | Indicates the reason for finishing the response generation (e.g., "stop", "length"). |
#### Example Usage

##### Request Example

```json
{
  "request": {
    "user_cred": {
      "token": "your_token_here",
      "client": {
        "tenant_id": "your_tenant_id"
      }
    },
    "user_chat": {
      "query": "What is the capital of France?"
    }
  },
  "is_session": true,
  "model_info": {
    "modelId": "azure",
    "modelVersion": "v1"
  },
  "prompt": "Please provide the capital city of France.",
  "tool_defns": [],
  "all_tools": {},
  "history_prompt": [],
  "query": "What is the capital of France?",
  "gen_search_text": "",
  "generated_chat_id": 12345,
  "api_key": "your_api_key_here"
}
```
##### Response Example

```json
{
  "generated_search_text": "The capital of France is Paris.",
  "response": "The capital of France is Paris.",
  "finish_reason": "stop"
}
```
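One way to consume the stream from Python is sketched below. It assumes newline-delimited JSON chunks and a placeholder base URL; the actual chunk framing and host depend on your deployment.

```python
import json
import requests

BASE_URL = "http://localhost:8000"  # placeholder; use your deployment's host

# Abbreviated payload for illustration; see the full request example above.
payload = {"query": "What is the capital of France?", "api_key": "your_api_key_here"}

with requests.post(
    f"{BASE_URL}/api/stream_generate_answer", json=payload, stream=True, timeout=60
) as resp:
    resp.raise_for_status()
    # Assumes each chunk arrives as one JSON object per line.
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk["response"], end="", flush=True)
        if chunk.get("finish_reason") == "stop":
            break
```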
## Request Models Documentation

This section describes the data models used in the API requests for the chat LLM proxy. Each model outlines the structure and types of the request payloads.

### GenerateAnswerRequest

#### Description

The `GenerateAnswerRequest` model is used to encapsulate the data required to generate an answer from the chat LLM.

#### Properties

- `user_cred` (`UserCred`): Contains user credentials.
- `bot_config` (`BotConfig`): Configuration settings for the bot.
- `query` (`string`): The input query for which an answer is to be generated.
- `task_process` (`TaskProcess`): Information about the task being processed.
- `prev_context` (`string`, optional): Previous context for continuation requests.
- `kvp` (`dict`): Key-value pairs for additional parameters.
#### Example

```json
{
  "user_cred": {
    "token": "user_token",
    "client": {
      "tenant_id": "tenant_id"
    }
  },
  "bot_config": {
    "caller_version": "v6",
    "tool_config": null
  },
  "query": "What is the capital of France?",
  "task_process": {
    "service": "chat_service"
  },
  "prev_context": null,
  "kvp": {
    "files": [
      {
        "file_name": "document1.txt",
        "source_category": "InputFile"
      }
    ]
  }
}
```
### Other Models

#### UserCred

- `token` (`string`): The authentication token for the user.
- `client` (`ClientInfo`): Information about the client.

#### BotConfig

- `caller_version` (`string`): The version of the bot being called.
- `tool_config` (`ToolConfig`, optional): Configuration for tools used by the bot.

#### TaskProcess

- `service` (`string`): The service being used for the task.

#### ClientInfo

- `tenant_id` (`string`): The tenant ID associated with the client.

#### ToolConfig

- (Define properties as needed based on your application requirements.)
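For orientation, the models above can be pictured as Pydantic classes along the following lines. This is a sketch for illustration only; the authoritative definitions live in `app.controller.chat_classes`, and field names, defaults, and validation may differ.

```python
from typing import Optional
from pydantic import BaseModel

# Illustrative sketch; the real definitions live in app.controller.chat_classes.

class ClientInfo(BaseModel):
    tenant_id: str

class UserCred(BaseModel):
    token: str
    client: ClientInfo

class ToolConfig(BaseModel):
    pass  # Properties depend on the tools an application enables.

class BotConfig(BaseModel):
    caller_version: str
    tool_config: Optional[ToolConfig] = None

class TaskProcess(BaseModel):
    service: str

class GenerateAnswerRequest(BaseModel):
    user_cred: UserCred
    bot_config: BotConfig
    query: str
    task_process: TaskProcess
    prev_context: Optional[str] = None
    kvp: dict = {}
```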